Extending RapidMiner with Data Search and Integration Capabilities
Authors
Abstract
Analysts are increasingly confronted with the situation that data they need for a data mining project exists somewhere on the Web or in an organization's intranet, but they are unable to find it. The data mining tools currently available on the market offer a wide range of powerful data mining methods but provide little support for searching for suitable data or for integrating data from multiple sources. This demo shows an extension to RapidMiner, a popular data mining framework, which enables analysts to search for relevant datasets and integrate discovered data with data they already know. In particular, we support the iterative extension of data tables with additional attributes. We will demonstrate the usage of the extension with a large corpus of tabular data extracted from Wikipedia.
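The abstract names the table-extension feature but does not say how discovered tables are matched or how conflicting values are combined. The following is a minimal Python sketch of one common approach, key-based table augmentation with majority-vote fusion, under assumed inputs: the local table is a list of dicts with a key column, and each discovered table is a dict with a `key_column` and `rows`. The names `extend_table`, `normalize`, `min_overlap`, and the corpus shape are hypothetical illustrations, not part of RapidMiner or of the described extension.

```python
from collections import Counter


def normalize(value):
    """Trim and lower-case a key so that 'Beta ' and 'beta' are treated as equal."""
    return str(value).strip().lower()


def extend_table(local_rows, key, corpus, attribute, min_overlap=0.5):
    """Return copies of local_rows with a new column `attribute` filled from corpus tables.

    local_rows  -- list of dicts, each holding a value for `key`
    corpus      -- hypothetical shape: list of {"key_column": str, "rows": [dict, ...]}
    min_overlap -- hypothetical threshold: share of local keys a corpus table must cover
    """
    local_keys = {normalize(r[key]) for r in local_rows}
    if not local_keys:
        return [dict(r) for r in local_rows]

    # Candidate values per local key, gathered from all qualifying corpus tables.
    votes = {k: Counter() for k in local_keys}
    for table in corpus:
        rows, key_col = table["rows"], table["key_column"]
        # Skip tables that do not offer the requested attribute at all.
        if not any(attribute in r for r in rows):
            continue
        # Skip tables whose key column matches too few of the local entities.
        table_keys = {normalize(r[key_col]) for r in rows}
        if len(local_keys & table_keys) / len(local_keys) < min_overlap:
            continue
        for r in rows:
            k = normalize(r[key_col])
            if k in votes and r.get(attribute) is not None:
                votes[k][r[attribute]] += 1

    # Fuse conflicting values by majority vote; keys with no evidence get None.
    extended = []
    for r in local_rows:
        counter = votes[normalize(r[key])]
        value = counter.most_common(1)[0][0] if counter else None
        extended.append({**r, attribute: value})
    return extended


if __name__ == "__main__":
    # Toy, made-up data purely for illustration.
    local = [{"name": "Alpha"}, {"name": "Beta"}, {"name": "Gamma"}, {"name": "Delta"}]
    corpus = [
        {"key_column": "entity", "rows": [{"entity": "alpha", "size": 10}, {"entity": "beta ", "size": 20}]},
        {"key_column": "entity", "rows": [{"entity": "Alpha", "size": 10}, {"entity": "Beta", "size": 25},
                                          {"entity": "Gamma", "size": 30}]},
        {"key_column": "entity", "rows": [{"entity": "BETA", "size": 20}, {"entity": "gamma", "size": 30}]},
        {"key_column": "entity", "rows": [{"entity": "alpha", "color": "red"}]},
    ]
    print(extend_table(local, "name", corpus, "size"))
```

Calling `extend_table` again on its own output with a different attribute name mirrors, in a simplified way, the iterative attribute-by-attribute extension the abstract describes. A real system would also need schema matching to decide which corpus column corresponds to the requested attribute; the sketch sidesteps that by assuming identical column names.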
Similar resources
Processing Data Streams with the RapidMiner Streams Plugin
In various applications we face a plethora of data that is often growing continuously. Such data arise in monitoring settings such as server log files, manufacturing processes, sensor networks, or high-volume news feeds such as Twitter. Analysis of such data differs from the traditional batch setting for which RapidMiner was initially designed. In this work we present the streams library ...
Sharing RapidMiner Workflows and Experiments with OpenML
OpenML is an online, collaborative environment for machine learning where researchers and practitioners can share datasets, workflows and experiments. While it is integrated in several machine learning environments, it has not yet been integrated into environments that offer a graphical interface to easily build and experiment with many data analysis workflows. In this work we introduce an integrati...
Combining RapidMiner operators with bioinformatics services – a powerful combination
Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands of databases and similar numbers of tools for processing those data. Any data analysis in molecular biology involves gathering and processing data from many sources, even before the analysis for the central biological question takes place. Taverna is a workflow workbench that allo...
Mining the Web of Linked Data with RapidMiner
Lots of data from different domains is published as Linked Open Data (LOD). While there are quite a few browsers for such data, as well as intelligent tools for particular purposes, a versatile tool for deriving additional knowledge by mining the Web of Linked Data is still missing. In this system paper, we introduce the RapidMiner Linked Open Data extension. The extension hooks into the powerf...
Radoop: Analyzing Big Data with RapidMiner and Hadoop
Working with large data sets is increasingly common in research and industry. There are some distributed data analytics solutions, such as Hadoop, that offer high scalability and fault tolerance, but they usually lack a user interface, so only developers can exploit their functionality. In this paper, we present Radoop, an extension for the RapidMiner data mining tool which provides easy-to-use o...